{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Down the rabbit hole with Tensorflow\n",
    "\n",
    "In this seminar, we're going to play with [Tensorflow](https://www.tensorflow.org/) and see how it helps you build deep learning models.\n",
    "\n",
    "If you're running this notebook outside the course environment, you'll need to install Tensorflow v1:\n",
    "* `pip install tensorflow==1.15.2` should install CPU-only TF on Linux & Mac OS\n",
    "* If you want GPU support from the onset, see the [TF install page](https://www.tensorflow.org/install/). `pip install tensorflow-gpu==1.15.2` might or might not work."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys, os\n",
    "if 'google.colab' in sys.modules:\n",
    "    %tensorflow_version 1.x\n",
    "    \n",
    "    if not os.path.exists('.setup_complete'):\n",
    "        !wget -q https://raw.githubusercontent.com/yandexdataschool/Practical_RL/master/setup_colab.sh -O- | bash\n",
    "\n",
    "        !wget -q https://raw.githubusercontent.com/yandexdataschool/Practical_RL/master/week04_[recap]_deep_learning/mnist.py\n",
    "\n",
    "        !touch .setup_complete\n",
    "\n",
    "# This code creates a virtual display to draw game images on.\n",
    "# It will have no effect if your machine has a monitor.\n",
    "if type(os.environ.get(\"DISPLAY\")) is not str or len(os.environ.get(\"DISPLAY\")) == 0:\n",
    "    !bash ../xvfb start\n",
    "    os.environ['DISPLAY'] = ':1'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "\n",
    "# session is main tensorflow object. You ask session to compute stuff for you.\n",
    "sess = tf.InteractiveSession()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Warming up\n",
    "For starters, let's implement a python function that computes the sum of squares of numbers from 0 to N-1.\n",
    "* Use numpy or python\n",
    "* An array of numbers 0 to N - numpy.arange(N)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def sum_squares(N):\n",
    "    return <YOUR CODE: student.implement_me()>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "sum_squares(10**8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Same with tensorflow__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# \"i will insert N here later\"\n",
    "N = tf.placeholder('int64', name=\"input_to_your_function\")\n",
    "\n",
    "# a recipe on how to produce {sum of squares of arange of N} given N\n",
    "result = tf.reduce_sum((tf.range(N)**2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "# dear session, compute the result please. Here's your N.\n",
    "print(sess.run(result, {N: 10**8}))\n",
    "\n",
    "# hint: run it several times to let tensorflow \"warm up\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How it works: computation graphs\n",
    "\n",
    "\n",
    "1. create placeholders for future inputs;\n",
    "2. define symbolic graph: a recipe for mathematical transformation of those placeholders;\n",
    "3. compute outputs of your graph with particular values for each placeholder\n",
    "  * ```sess.run(outputs, {placeholder1:value1, placeholder2:value2})```\n",
    "  * OR output.eval({placeholder:value}) \n",
    "\n",
    "Still confused? We gonna fix that."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Placeholders and constants__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# placeholder that can be arbitrary float32 scalar, vertor, matrix, etc.\n",
    "arbitrary_input = tf.placeholder('float32')\n",
    "\n",
    "# input vector of arbitrary length\n",
    "input_vector = tf.placeholder('float32', shape=(None,))\n",
    "\n",
    "# input vector that _must_ have 10 elements and integer type\n",
    "fixed_vector = tf.placeholder('int32', shape=(10,))\n",
    "\n",
    "# you can generally use None whenever you don't need a specific shape\n",
    "input1 = tf.placeholder('float64', shape=(None, 100, None))\n",
    "input2 = tf.placeholder('int32', shape=(None, None, 3, 224, 224))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can create new __tensors__ with arbitrary operations on placeholders, constants and other tensors.\n",
    "\n",
    "* tf.reduce_sum(tf.arange(N)\\**2) are 3 sequential transformations of placeholder N\n",
    "* there's a tensorflow symbolic version for every numpy function\n",
    "  * `a + b, a / b, a ** b, ...` behave just like in numpy\n",
    "  * np.zeros -> tf.zeros\n",
    "  * np.sin -> tf.sin\n",
    "  * np.mean -> tf.reduce_mean\n",
    "  * np.arange -> tf.range\n",
    "  \n",
    "There are tons of other stuff in tensorflow, see the [docs](https://www.tensorflow.org/api_docs/python) or learn as you go with __shift+tab__."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# elementwise multiplication\n",
    "double_the_vector = input_vector * 2\n",
    "\n",
    "# elementwise cosine\n",
    "elementwise_cosine = tf.cos(input_vector)\n",
    "\n",
    "# elementwise difference between squared vector and it's means - with some random salt\n",
    "vector_squares = input_vector ** 2 - \\\n",
    "    tf.reduce_mean(input_vector) + tf.random_normal(tf.shape(input_vector))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Practice 1: polar pretzels\n",
    "_inspired by [this post](https://www.quora.com/What-are-the-most-interesting-equation-plots)_\n",
    "\n",
    "There are some simple mathematical functions with cool plots. For one, consider this:\n",
    "\n",
    "$$ x(t) = t - 1.5 * cos( 15 t) $$\n",
    "$$ y(t) = t - 1.5 * sin( 16 t) $$\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "t = tf.placeholder('float32')\n",
    "\n",
    "\n",
    "# compute x(t) and y(t) as defined above.\n",
    "x = <YOUR CODE>\n",
    "y = <YOUR CODE>\n",
    "\n",
    "\n",
    "x_points, y_points = sess.run([x, y], {t: np.linspace(-10, 10, num=10000)})\n",
    "plt.plot(x_points, y_points)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualizing graphs with Tensorboard\n",
    "\n",
    "It's often useful to visualize the computation graph when debugging or optimizing. \n",
    "Interactive visualization is where tensorflow really shines as compared to other frameworks. \n",
    "\n",
    "There's a special instrument for that, called Tensorboard. You can launch it from console:\n",
    "\n",
    "__```tensorboard --logdir=/tmp/tboard --port=7007```__\n",
    "\n",
    "If you're pathologically afraid of consoles, try this:\n",
    "\n",
    "__```import os; os.system(\"tensorboard --logdir=/tmp/tboard --port=7007 &\")```__\n",
    "\n",
    "_(but don't tell anyone we taught you that)_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One basic functionality of tensorboard is drawing graphs. One you've run the cell above, go to `localhost:7007` in your browser and switch to _graphs_ tab in the topbar. \n",
    "\n",
    "Here's what you should see:\n",
    "\n",
    "<img src=\"https://www.tensorflow.org/images/graph_vis_animation.gif\" width=480>\n",
    "\n",
    "Tensorboard also allows you to draw graphs (e.g. learning curves), record images & audio ~~and play flash games~~. This is useful when monitoring learning progress and catching some training issues.\n",
    "\n",
    "One researcher said:\n",
    "```\n",
    "If you spent last four hours of your worktime watching as your algorithm prints numbers and draws figures, you're probably doing deep learning wrong.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can read more on tensorboard usage [here](https://www.tensorflow.org/get_started/graph_viz)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Practice 2: mean squared error\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Quest #1 - implement a function that computes a mean squared error of two input vectors\n",
    "# Your function has to take 2 vectors and return a single number\n",
    "\n",
    "<YOUR CODE: student.define_inputs_and_transformations()>\n",
    "\n",
    "mse = <YOUR CODE: student.define_transformation()>\n",
    "\n",
    "compute_mse = lambda vector1, vector2: sess.run( <YOUR CODE: how to run your graph?> , {})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Tests\n",
    "from sklearn.metrics import mean_squared_error\n",
    "\n",
    "for n in [1, 5, 10, 10 ** 3]:\n",
    "\n",
    "    elems = [np.arange(n), np.arange(n, 0, -1), np.zeros(n),\n",
    "             np.ones(n), np.random.random(n), np.random.randint(100, size=n)]\n",
    "\n",
    "    for el in elems:\n",
    "        for el_2 in elems:\n",
    "            true_mse = np.array(mean_squared_error(el, el_2))\n",
    "            my_mse = compute_mse(el, el_2)\n",
    "            if not np.allclose(true_mse, my_mse):\n",
    "                print('Wrong result:')\n",
    "                print('mse(%s,%s)' % (el, el_2))\n",
    "                print(\"should be: %f, but your function returned %f\" %\n",
    "                      (true_mse, my_mse))\n",
    "                raise ValueError, \"Что-то не так\"\n",
    "\n",
    "print(\"All tests passed\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tensorflow variables\n",
    "\n",
    "The inputs and transformations have no value outside function call. That's a bit unnatural if you want your model to have parameters (e.g. network weights) that are always present, but can change their value over time.\n",
    "\n",
    "Tensorflow solves this with `tf.Variable` objects.\n",
    "* You can assign variable a value at any time in your graph\n",
    "* Unlike placeholders, there's no need to explicitly pass values to variables when `sess.run(...)`-ing\n",
    "* You can use variables the same way you use transformations \n",
    " "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# creating shared variable\n",
    "shared_vector_1 = tf.Variable(initial_value=np.ones(5))\n",
    "\n",
    "# initialize all variables with initial values\n",
    "sess.run(tf.global_variables_initializer())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# evaluating shared variable (outside symbolicd graph)\n",
    "print(\"initial value\", sess.run(shared_vector_1))\n",
    "\n",
    "# within symbolic graph you use them just as any other inout or transformation, not \"get value\" needed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# setting new value manually\n",
    "sess.run(shared_vector_1.assign(np.arange(5)))\n",
    "\n",
    "# getting that new value\n",
    "print(\"new value\", sess.run(shared_vector_1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# tf.gradients - why graphs matter\n",
    "* Tensorflow can compute derivatives and gradients automatically using the computation graph\n",
    "* Gradients are computed as a product of elementary derivatives via chain rule:\n",
    "\n",
    "$$ {\\partial f(g(x)) \\over \\partial x} = {\\partial f(g(x)) \\over \\partial g(x)}\\cdot {\\partial g(x) \\over \\partial x} $$\n",
    "\n",
    "It can get you the derivative of any graph as long as it knows how to differentiate elementary operations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_scalar = tf.placeholder('float32')\n",
    "\n",
    "scalar_squared = my_scalar ** 2\n",
    "\n",
    "# a derivative of scalar_squared by my_scalar\n",
    "derivative = tf.gradients(scalar_squared, [my_scalar])[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = np.linspace(-3, 3)\n",
    "x_squared, x_squared_der = sess.run(\n",
    "    [scalar_squared, derivative], {my_scalar: x})\n",
    "\n",
    "plt.plot(x, x_squared, label=\"x^2\")\n",
    "plt.plot(x, x_squared_der, label=\"derivative\")\n",
    "plt.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Why autograd is cool"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_vector = tf.placeholder('float32', [None])\n",
    "\n",
    "# Compute the gradient of the next weird function over my_scalar and my_vector\n",
    "# warning! Trying to understand the meaning of that function may result in permanent brain damage\n",
    "\n",
    "weird_psychotic_function = tf.reduce_mean((my_vector+my_scalar)**(1+tf.nn.moments(my_vector, [0])[1]) + 1. / tf.atan(my_scalar))/(my_scalar**2 + 1) + 0.01*tf.sin(\n",
    "    2*my_scalar**1.5)*(tf.reduce_sum(my_vector) * my_scalar**2)*tf.exp((my_scalar-4)**2)/(1+tf.exp((my_scalar-4)**2))*(1.-(tf.exp(-(my_scalar-4)**2))/(1+tf.exp(-(my_scalar-4)**2)))**2\n",
    "\n",
    "der_by_scalar = <YOUR CODE: student.compute_grad_over_scalar()>\n",
    "der_by_vector = <YOUR CODE: student.compute_grad_over_vector()>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plotting your derivative\n",
    "scalar_space = np.linspace(1, 7, 100)\n",
    "\n",
    "y = [\n",
    "    sess.run(weird_psychotic_function, {my_scalar: x, my_vector: [1, 2, 3]})\n",
    "    for x in scalar_space]\n",
    "\n",
    "plt.plot(scalar_space, y, label='function')\n",
    "\n",
    "y_der_by_scalar = [\n",
    "    sess.run(der_by_scalar, {my_scalar: x, my_vector: [1, 2, 3]})\n",
    "    for x in scalar_space]\n",
    "\n",
    "plt.plot(scalar_space, y_der_by_scalar, label='derivative')\n",
    "plt.grid()\n",
    "plt.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Almost done - optimizers\n",
    "\n",
    "While you can perform gradient descent by hand with automatic grads from above, tensorflow also has some optimization methods implemented for you. Recall momentum & rmsprop?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_guess = tf.Variable(np.zeros(2, dtype='float32'))\n",
    "y_true = tf.range(1, 3, dtype='float32')\n",
    "\n",
    "loss = tf.reduce_mean((y_guess - y_true + tf.random_normal([2]))**2)\n",
    "\n",
    "optimizer = tf.train.MomentumOptimizer(\n",
    "    0.01, 0.9).minimize(loss, var_list=y_guess)\n",
    "\n",
    "# same, but more detailed:\n",
    "# updates = [[tf.gradients(loss,y_guess)[0], y_guess]]\n",
    "# optimizer = tf.train.MomentumOptimizer(0.01,0.9).apply_gradients(updates)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import clear_output\n",
    "\n",
    "sess.run(tf.global_variables_initializer())\n",
    "\n",
    "guesses = [sess.run(y_guess)]\n",
    "\n",
    "for _ in range(100):\n",
    "    sess.run(optimizer)\n",
    "    guesses.append(sess.run(y_guess))\n",
    "\n",
    "    clear_output(True)\n",
    "    plt.plot(*zip(*guesses), marker='.')\n",
    "    plt.scatter(*sess.run(y_true), c='red')\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Logistic regression example\n",
    "Implement the regular logistic regression training algorithm\n",
    " \n",
    "We shall train on a two-class MNIST dataset. \n",
    "\n",
    "This is a binary classification problem, so we'll train a __Logistic Regression with sigmoid__.\n",
    "$$P(y_i | X_i) = \\sigma(W \\cdot X_i + b) ={ 1 \\over {1+e^{- [W \\cdot X_i + b]}} }$$\n",
    "\n",
    "\n",
    "The natural choice of loss function is to use binary crossentropy (aka logloss, negative llh):\n",
    "$$ L = {1 \\over N} \\underset{X_i,y_i} \\sum - [  y_i \\cdot log P(y_i | X_i) + (1-y_i) \\cdot log (1-P(y_i | X_i)) ]$$\n",
    "\n",
    "Mind the minus :)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import load_digits\n",
    "X, y = load_digits(2, return_X_y=True)\n",
    "\n",
    "print(\"y [shape - %s]:\" % (str(y.shape)), y[:10])\n",
    "print(\"X [shape - %s]:\" % (str(X.shape)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('X:\\n', X[:3, :10])\n",
    "print('y:\\n', y[:10])\n",
    "plt.imshow(X[0].reshape([8, 8]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# inputs and shareds\n",
    "weights = <YOUR CODE: student.create_variable()>\n",
    "input_X = <YOUR CODE: student.create_placeholder_matrix()>\n",
    "input_y = <YOUR CODE: student.code_placeholder_vector()>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "predicted_y_proba = <YOUR CODE: predicted probabilities for input_X using weights>\n",
    "\n",
    "loss = <YOUR CODE: logistic loss(scalar, mean over sample) between predicted_y_proba and input_y>\n",
    "\n",
    "train_step = <YOUR CODE: operator that minimizes loss>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import roc_auc_score\n",
    "\n",
    "for i in range(5):\n",
    "\n",
    "    loss_i, _ = sess.run([loss, train_step], <YOUR CODE: feed values to placeholders>)\n",
    "\n",
    "    print(\"loss at iter %i: %.4f\" % (i, loss_i))\n",
    "\n",
    "    print(\"train auc:\", roc_auc_score(\n",
    "        y_train, sess.run(predicted_y_proba, {input_X: X_train})))\n",
    "    print(\"test auc:\", roc_auc_score(\n",
    "        y_test, sess.run(predicted_y_proba, {input_X: X_test})))\n",
    "\n",
    "\n",
    "print(\"resulting weights:\")\n",
    "plt.imshow(weights.get_value().reshape(8, -1))\n",
    "plt.colorbar();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Practice 3: my first tensorflow network\n",
    "Your ultimate task for this week is to build your first neural network [almost] from scratch and pure tensorflow.\n",
    "\n",
    "This time you will same digit recognition problem, but at a larger scale\n",
    "* images are now 28x28\n",
    "* 10 different digits\n",
    "* 50k samples\n",
    "\n",
    "Note that you are not required to build 152-layer monsters here. A 2-layer (one hidden, one output) NN should already have ive you an edge over logistic regression.\n",
    "\n",
    "__[bonus score]__\n",
    "If you've already beaten logistic regression with a two-layer net, but enthusiasm still ain't gone, you can try improving the test accuracy even further! The milestones would be 95%/97.5%/98.5% accuraсy on test set.\n",
    "\n",
    "__SPOILER!__\n",
    "At the end of the notebook you will find a few tips and frequently made mistakes. If you feel enough might to shoot yourself in the foot without external assistance, we encourage you to do so, but if you encounter any unsurpassable issues, please do look there before mailing us."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mnist import load_dataset\n",
    "\n",
    "# [down]loading the original MNIST dataset.\n",
    "# Please note that you should only train your NN on _train sample,\n",
    "#  _val can be used to evaluate out-of-sample error, compare models or perform early-stopping\n",
    "#  _test should be hidden under a rock untill final evaluation... But we both know it is near impossible to catch you evaluating on it.\n",
    "X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()\n",
    "\n",
    "print(X_train.shape, y_train.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.imshow(X_train[0, 0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "<this cell looks as if it wants you to create variables here>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "<you could just as well create a computation graph here - loss, optimizers, all that stuff>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "<this may or may not be a good place to run optimizer in a loop>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "<this may be a perfect cell to write a training & evaluation loop in>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "<predict & evaluate on test here, right? No cheating pls.>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```\n",
    "\n",
    "```\n",
    "\n",
    "```\n",
    "\n",
    "```\n",
    "\n",
    "```\n",
    "\n",
    "```\n",
    "\n",
    "```\n",
    "\n",
    "```\n",
    "\n",
    "```\n",
    "\n",
    "```\n",
    "\n",
    "```\n",
    "\n",
    "```\n",
    "\n",
    "```\n",
    "\n",
    "```\n",
    "\n",
    "```\n",
    "\n",
    "```\n",
    "\n",
    "\n",
    "# SPOILERS!\n",
    "\n",
    "Recommended pipeline\n",
    "\n",
    "* Adapt logistic regression from previous assignment to classify some number against others (e.g. zero vs nonzero)\n",
    "* Generalize it to multiclass logistic regression.\n",
    "  - Either try to remember lecture 0 or google it.\n",
    "  - Instead of weight vector you'll have to use matrix (feature_id x class_id)\n",
    "  - softmax (exp over sum of exps) can implemented manually or as T.nnet.softmax (stable)\n",
    "  - probably better to use STOCHASTIC gradient descent (minibatch)\n",
    "    - in which case sample should probably be shuffled (or use random subsamples on each iteration)\n",
    "* Add a hidden layer. Now your logistic regression uses hidden neurons instead of inputs.\n",
    "  - Hidden layer uses the same math as output layer (ex-logistic regression), but uses some nonlinearity (sigmoid) instead of softmax\n",
    "  - You need to train both layers, not just output layer :)\n",
    "  - Do not initialize layers with zeros (due to symmetry effects). A gaussian noize with small sigma will do.\n",
    "  - 50 hidden neurons and a sigmoid nonlinearity will do for a start. Many ways to improve. \n",
    "  - In ideal casae this totals to 2 .dot's, 1 softmax and 1 sigmoid\n",
    "  - __make sure this neural network works better than logistic regression__\n",
    "  \n",
    "* Now's the time to try improving the network. Consider layers (size, neuron count),  nonlinearities, optimization methods, initialization - whatever you want, but please avoid convolutions for now."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
